Value Prediction Network

Authors

  • Junhyuk Oh
  • Satinder Singh
  • Honglak Lee
Abstract

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
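The idea of planning over abstract states that predict rewards and values, rather than observations, can be illustrated with a minimal sketch. The abstract does not give the planning rule, so the recursion below is a generic depth-limited lookahead and not necessarily the authors' method. All names (`lookahead_q`, `plan`, `toy_transition`, `toy_reward`, `toy_value`) and the toy one-dimensional model are hypothetical stand-ins for the learned network modules.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]  # hypothetical abstract state: a small tuple
Option = int             # hypothetical option identifier


def lookahead_q(state: State, option: Option, depth: int,
                options: Sequence[Option],
                transition: Callable[[State, Option], State],
                reward: Callable[[State, Option], float],
                value: Callable[[State], float],
                gamma: float) -> float:
    """Estimate Q(state, option) by rolling the model `depth` steps ahead.

    Only rewards and values are predicted; no observation is reconstructed.
    """
    next_state = transition(state, option)
    r = reward(state, option)
    if depth <= 1:
        # Leaf of the lookahead: bootstrap from the predicted value.
        return r + gamma * value(next_state)
    # Otherwise back up the best option from the predicted next state.
    best_next = max(
        lookahead_q(next_state, o, depth - 1, options,
                    transition, reward, value, gamma)
        for o in options
    )
    return r + gamma * best_next


def plan(state: State, depth: int, options: Sequence[Option],
         transition, reward, value, gamma: float) -> Option:
    """Pick the option with the highest lookahead Q estimate."""
    return max(options, key=lambda o: lookahead_q(
        state, o, depth, options, transition, reward, value, gamma))


# Toy stand-ins for learned modules (assumptions for illustration only).
def toy_transition(state: State, option: Option) -> State:
    return (state[0] + option,)


def toy_reward(state: State, option: Option) -> float:
    # Reward of 1 for stepping onto position 3, else 0.
    return 1.0 if state[0] + option == 3 else 0.0


def toy_value(state: State) -> float:
    return 0.0  # an uninformative value estimate


if __name__ == "__main__":
    best = plan((0,), 3, [-1, 1], toy_transition, toy_reward, toy_value, 0.5)
    print("chosen option:", best)
```

In this toy setting a one-step lookahead cannot distinguish the two options from position 0, while a three-step lookahead finds the reward at position 3, which mirrors the abstract's point that even short-lookahead planning can help.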

Publication date: 2017